# 1. Network Layering

Previously I suggested these two layers of networks (I have renamed them here):

## 1.1 Sensor-space Sequence Network

<pre>
+---------------------+  +------------+
|      sensor_t1      |  |  motor_t0  |
+---------------------+  +------------+
               ^           ^
         +---------------------+
         |       hidden_t0     |
         +---------------------+
               ^           ^
+---------------------+  +------------+
|      sensor_t0      |  |  motor_t0  |
+---------------------+  +------------+
</pre>

## 1.2 Hidden-space Sequence Network

<pre>
           +---------------------+  
           |      hidden_t1      |  
           +---------------------+  
                      ^
             +-----------------+
             |   hidden2_t0    |
             +-----------------+
                 ^           ^
+---------------------+  +---------------------+ 
|      hidden_t0      |  |      hidden_goal    | 
+---------------------+  +---------------------+ 
</pre>

Comparing these two networks, one notices some symmetry:

* both take state and control, and produce the next state
  * hidden_t0 plays the role of sensor_t0 in the Hidden-space network
  * hidden_goal plays the role of motor_t0 in the Hidden-space network
  
This also suggests a slight variation of the Hidden-space network:

<pre>
  +---------------------+  +---------------------+  
  |      hidden_t1      |  |      hidden_goal    |  
  +---------------------+  +---------------------+  
                 ^           ^
             +-----------------+
             |   hidden2_t0    |
             +-----------------+
                 ^           ^
+---------------------+  +---------------------+ 
|      hidden_t0      |  |      hidden_goal    | 
+---------------------+  +---------------------+ 
</pre>

What does that give you?

* complete symmetry (who cares?)
* the possibility of applying to the Hidden-space Network what we did with the Sensor-space Network
  * that would give us the possibility of sequencing goals
  * and that is repeatable (e.g., more networks doing higher-level planning)

Note: to use this in a higher-level Hidden-Hidden-space Sequence Network, we will need a goal no-op to get started.
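
One way to see why the symmetric variant is repeatable: each level maps `[state, control]` to `[next_state, control]`, so its input and output have the same width, and the same kind of network can be reused one level up. The following is a toy sketch with untrained random weights, not the networks trained in this notebook. The `Level` class is hypothetical; the layer sizes (19 sensors, 2 motors, 25 and 50 hidden units) are the ones used in the Training section.

```python
import numpy as np

class Level:
    """Toy feedforward level: [state, control] -> hidden -> [next_state, control]."""

    def __init__(self, state_size, control_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        width = state_size + control_size
        # Untrained random weights; input and output share the same width.
        self.w_in = rng.normal(scale=0.1, size=(width, hidden_size))
        self.w_out = rng.normal(scale=0.1, size=(hidden_size, width))

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def encode(self, state, control):
        """Hidden-layer representation of a (state, control) pair."""
        return self._sigmoid(np.concatenate([state, control]) @ self.w_in)

    def predict(self, state, control):
        """Predicted [next_state, control] vector."""
        return self._sigmoid(self.encode(state, control) @ self.w_out)

sensor_level = Level(state_size=19, control_size=2, hidden_size=25)
hidden_level = Level(state_size=25, control_size=25, hidden_size=50, seed=1)

noop = np.array([0.5, 0.5])                       # scaled motor no-op
h_now = sensor_level.encode(np.zeros(19), noop)   # becomes the state one level up
h_goal = sensor_level.encode(np.ones(19), noop)   # becomes the control one level up
out = hidden_level.predict(h_now, h_goal)
print(h_now.shape, out.shape)
```

The hidden vector of one level serves as the state of the next, and a goal expressed in that same hidden space serves as its control. A third level would need its own no-op in goal space, which is the goal no-op mentioned in the note.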

# 2. Training

In [1]:
from IPython.display import display, HTML
import theano.tensor as T
import numpy as np
import sys
sys.path.append("..")

In [2]:
from discover import *

Using TensorFlow backend.


We import the discover program and load the experiment we are interested in:

In [3]:
FLAGS.directory = "../results" # Where to save/restore files
FLAGS.mode = 'test' # Should be 'wander' or 'test'
FLAGS.num_steps = 5000 # Number of steps to wander and learn
FLAGS.num_hiddens = 10 # Number of hidden units in model

In [4]:
gd = GoalDiscovery()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
g_in (InputLayer)                (None, 10)            0                                            
____________________________________________________________________________________________________
s_in (InputLayer)                (None, 19)            0                                            
____________________________________________________________________________________________________
m_in (InputLayer)                (None, 2)             0                                            
____________________________________________________________________________________________________
c_in (InputLayer)                (None, 10)            0                                            
___________________________________________________________________________________________

  x = merge([g_in, s_in, m_in, c_in], mode='concat')
  name=name)
  self.model = Model(input=[g_in, s_in, m_in, c_in], output=[g_out, s_out, m_out])
  output=[self.model.get_layer('h').output])


In [5]:
gd.restore_model()


Restoring weights


Restoring goals


Restoring history



In [6]:
log = gd.read_log()

Rendering simulator images...


In [7]:
from conx import Network

The step-wise model is straightforward, except for an initial sensor_t0 + motor-noop -> sensor_t0 step. This is a no-motion motor action that yields the same sensor readings (an identity function, of sorts). It will be explained more fully in the sequence model.
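
The no-op depends on how motor values are scaled for the network. In the cells of this notebook, motor commands are mapped with `(motor + 1)/2.0` before being fed to the network and mapped back with `motor_output * 2.0 - 1.0` before being sent to the robot. A minimal sketch of that round trip, assuming motor commands lie in [-1, 1]; the helper names `to_unit` and `from_unit` are hypothetical and not part of `discover` or `conx`:

```python
import numpy as np

def to_unit(motor):
    """Map motor commands from [-1, 1] to the [0, 1] range used as network input."""
    return (np.asarray(motor, dtype=float) + 1.0) / 2.0

def from_unit(activation):
    """Map network activations in [0, 1] back to motor commands in [-1, 1]."""
    return np.asarray(activation, dtype=float) * 2.0 - 1.0

noop = np.array([0.0, 0.0])    # don't move
scaled_noop = to_unit(noop)    # 0.5 on both motor channels
print(scaled_noop, from_unit(scaled_noop))
```

This is why the first row of each test sequence shows `0.5  0.5` for the motors: it is the scaled form of the zero command.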

In [8]:
goalset = [10, 29, 33, 39, 40, 59, 68]

In [9]:
def build_stepwise_dataset(*goals):
    """
    Given sensor[t0] + motor[t0] -> sensor[t1] + motor[t0]
    """
    if len(goals) == 0:
        goals = range(len(log["goals"]))
    data = []
    for step in [log["goals"][goal] for goal in goals]: # for each step goal created
        # add the motor no-op
        sensor_t0 = gd.history[step - gd.recall_steps]['sensors'][0]
        motor = np.array([0.0, 0.0])
        # identity, noop, don't move
        data.append([np.concatenate([sensor_t0, (motor + 1)/2.0 ]), 
                     np.concatenate([sensor_t0, (motor + 1)/2.0])])
        for j in range(-gd.recall_steps, 2, 1):
            sensor_t0 = gd.history[step + j]['sensors'][0]
            motor = gd.history[step + j]['motors'][0]
            sensor_t1 = gd.history[step + j + 1]['sensors'][0]
            data.append([np.concatenate([sensor_t0, (motor + 1)/2.0 ]), 
                         np.concatenate([sensor_t1, (motor + 1)/2.0])])
    return data

In [10]:
motor_size = 2
sensor_size = 19
hidden_size = 25
stepwise = Network(sensor_size + motor_size, hidden_size, sensor_size + motor_size, 
                    epsilon=0.1, momentum=0.1)

In [11]:
stepwise_dataset = build_stepwise_dataset(*goalset)
stepwise.set_inputs(stepwise_dataset)

In [12]:
stepwise.load("stepwise.net")

In [13]:
stepwise_dict = {}
for inputs, targets in stepwise_dataset:
    hidden = stepwise.layer[0].propagate(inputs)
    stepwise_dict[tuple(hidden)] = targets

# 5. The New Hidden-space Sequence Network

Now that the single-step model is trained, we can use its hidden layer representations in the next model, the sequence network.

In [14]:
hidden2_size = 50
sequence = Network(hidden_size * 2, 
                   hidden2_size, 
                   hidden_size + hidden_size, 
                   epsilon=0.1, momentum=0.1) # in: hidden_t0 + hidden_goal; out: hidden_t1 + hidden_goal

Again, because this is a feedforward network, we can build a dataset and train on each step independently.

Note that we need a sensor + noop motor action to get started. That is, we know what our sensors are, but because we need an initial hidden-layer representation, we use a motor-noop (the don't move motor action).
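
The pairing can be sketched with plain NumPy. Given the hidden vectors along one trajectory and a fixed goal vector, each training pair maps `[hidden_t, goal]` to `[hidden_t+1, goal]`, and the first pair starts from the no-op hidden vector. The function `make_sequence_pairs` is hypothetical, and the random vectors are stand-ins for the stepwise network's hidden activations:

```python
import numpy as np

def make_sequence_pairs(hidden_noop, hiddens, hidden_goal):
    """Build (input, target) pairs of the form [h_t, goal] -> [h_t+1, goal].

    hidden_noop: hidden vector for initial sensors + motor no-op
    hiddens:     hidden vectors for each step of the trajectory, in order
    hidden_goal: hidden vector for the goal state (held fixed)
    """
    pairs = []
    previous = hidden_noop
    for current in hiddens:
        pairs.append((np.concatenate([previous, hidden_goal]),
                      np.concatenate([current, hidden_goal])))
        previous = current
    return pairs

hidden_size = 25
rng = np.random.default_rng(0)
trajectory = [rng.random(hidden_size) for _ in range(12)]  # stand-in activations
hidden_noop = rng.random(hidden_size)
# The goal is the last hidden vector of the trajectory.
pairs = make_sequence_pairs(hidden_noop, trajectory, trajectory[-1])
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)   # 12 (50,) (50,)
```

Because the goal half of every vector is constant within a trajectory, the network only has to learn how the first half advances toward it.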

In [15]:
def build_sequence_dataset(*goals):
    """
    hidden[initial_sensor + noop_motor] + hidden[goal] -> hidden[sensor_t1 + motor2]
    hidden[sensor_t0 + motor1] + hidden[goal] -> hidden[sensor_t1 + motor2]
    """
    global sequence_dict
    sequence_dict = {}
    if len(goals) == 0:
        goals = range(len(log["goals"]))
    data = []
    for step in [log["goals"][goal] for goal in goals]: # for each step goal created
        # get the hidden[goal + last motor]
        sensor_goal = gd.history[step + 1]['sensors'][0]
        motor1 = gd.history[step + 1]['motors'][0]
        hidden_goal = stepwise.layer[0].propagate(np.concatenate([sensor_goal, (motor1 + 1)/2.0 ]))
        # add the hidden[initial sensors + motor no-op]
        initial_sensor = gd.history[step - gd.recall_steps]['sensors'][0]
        noop_motor = np.array([0.0, 0.0])
        hidden_noop = stepwise.layer[0].propagate(np.concatenate([initial_sensor, (noop_motor + 1)/2.0 ]))
        # First step:
        sensor_t0 = gd.history[step - gd.recall_steps]['sensors'][0]
        motor_t0 = gd.history[step - gd.recall_steps]['motors'][0]
        hidden_t0 = stepwise.layer[0].propagate(np.concatenate([sensor_t0, (motor_t0 + 1)/2.0 ]))
        # learn on that:
        data.append([np.concatenate([hidden_noop, hidden_goal]), 
                     np.concatenate([hidden_t0, hidden_goal])])
        sequence_dict[tuple(data[-1][0])] = data[-1][1]
        # now, start sequence:
        for j in range(-gd.recall_steps, 1, 1):
            # next hidden, motor:
            sensor_t1 = gd.history[step + j + 1]['sensors'][0]
            motor_t1 = gd.history[step + j + 1]['motors'][0]
            hidden_t1 = stepwise.layer[0].propagate(np.concatenate([sensor_t1, (motor_t1 + 1)/2.0 ]))
            data.append([np.concatenate([hidden_t0, hidden_goal]), 
                         np.concatenate([hidden_t1, hidden_goal])])
            sequence_dict[tuple(data[-1][0])] = data[-1][1]
            hidden_t0 = hidden_t1
        if list(sensor_goal) != list(sensor_t1) or list(hidden_goal) != list(hidden_t1):
            print("last step is not goal!")
            stepwise.pp("hidden_t0  :", hidden_t0)
            stepwise.pp("hidden_goal:", hidden_goal)
            break
    return data

In [16]:
sequence_dataset = build_sequence_dataset(*goalset)

Total training input/target pairs:

In [17]:
len(sequence_dataset)

84
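
That count is consistent with 12 pairs for each of the 7 goals: one no-op pair plus `recall_steps + 1` sequence pairs. The value `recall_steps = 10` is an assumption here, inferred from the stride of 12 used to index `sequence_dataset` and the `step - 10` passed to `render_step_behavior` in the test cell; it is not read from `gd`:

```python
recall_steps = 10    # assumed; the notebook reads gd.recall_steps
num_goals = 7        # len(goalset)

# build_sequence_dataset: 1 no-op pair + one pair per j in range(-recall_steps, 1)
sequence_pairs_per_goal = 1 + len(range(-recall_steps, 1))
# build_stepwise_dataset: 1 no-op pair + one pair per j in range(-recall_steps, 2)
stepwise_pairs_per_goal = 1 + len(range(-recall_steps, 2))

print(sequence_pairs_per_goal, stepwise_pairs_per_goal)   # 12 13
print(num_goals * sequence_pairs_per_goal)                # 84
```

These two per-goal counts are the strides `12` and `13` that the test cell uses to index `sequence_dataset` and `stepwise_dataset`.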

We can train on some or all of the patterns. For this experiment, I trained on one, then two, then a few more, then all.

In [18]:
sequence.set_inputs(sequence_dataset)

In [19]:
sequence.train(report_rate=5)

--------------------------------------------------
Training for max trails: 5000 ...
Epoch: 0 TSS error: 814.633727742 %correct: 0.0
Epoch: 5 TSS error: 56.8148474386 %correct: 0.0
Epoch: 10 TSS error: 36.9527065065 %correct: 0.0
Epoch: 15 TSS error: 32.7776544067 %correct: 1.1904761904761905
Epoch: 20 TSS error: 29.7060240751 %correct: 0.0
Epoch: 25 TSS error: 30.7119705735 %correct: 0.0
Epoch: 30 TSS error: 26.514936533 %correct: 1.1904761904761905
Epoch: 35 TSS error: 23.8863498698 %correct: 2.380952380952381
Epoch: 40 TSS error: 22.5074910913 %correct: 2.380952380952381
Epoch: 45 TSS error: 21.9453712831 %correct: 10.714285714285714
Epoch: 50 TSS error: 22.0101809933 %correct: 5.952380952380952
Epoch: 55 TSS error: 22.2418654038 %correct: 11.904761904761903
Epoch: 60 TSS error: 19.35311847 %correct: 5.952380952380952
Epoch: 65 TSS error: 18.8520892298 %correct: 10.714285714285714
Epoch: 70 TSS error: 19.3400949959 %correct: 9.523809523809524
Epoch: 75 TSS error: 18.0429622231 %corr

Epoch: 695 TSS error: 8.07605545906 %correct: 48.80952380952381
Epoch: 700 TSS error: 8.2656689451 %correct: 50.0
Epoch: 705 TSS error: 8.48305435163 %correct: 52.38095238095239
Epoch: 710 TSS error: 7.75197432309 %correct: 58.333333333333336
Epoch: 715 TSS error: 8.18026162379 %correct: 51.19047619047619
Epoch: 720 TSS error: 7.77350096772 %correct: 53.57142857142857
Epoch: 725 TSS error: 8.13232272679 %correct: 46.42857142857143
Epoch: 730 TSS error: 7.67821076901 %correct: 57.14285714285714
Epoch: 735 TSS error: 8.52742635269 %correct: 44.047619047619044
Epoch: 740 TSS error: 7.51726908028 %correct: 51.19047619047619
Epoch: 745 TSS error: 8.72989657999 %correct: 35.714285714285715
Epoch: 750 TSS error: 7.53896805679 %correct: 53.57142857142857
Epoch: 755 TSS error: 7.71742207165 %correct: 48.80952380952381
Epoch: 760 TSS error: 7.80365343862 %correct: 51.19047619047619
Epoch: 765 TSS error: 7.40527082607 %correct: 51.19047619047619
Epoch: 770 TSS error: 9.06823282179 %correct: 39.28

Epoch: 1370 TSS error: 6.87658352491 %correct: 51.19047619047619
Epoch: 1375 TSS error: 6.1802057406 %correct: 61.904761904761905
Epoch: 1380 TSS error: 6.48105961825 %correct: 58.333333333333336
Epoch: 1385 TSS error: 6.47878410387 %correct: 58.333333333333336
Epoch: 1390 TSS error: 6.17205347812 %correct: 64.28571428571429
Epoch: 1395 TSS error: 6.80108489782 %correct: 52.38095238095239
Epoch: 1400 TSS error: 7.16991885305 %correct: 50.0
Epoch: 1405 TSS error: 6.18932713975 %correct: 63.095238095238095
Epoch: 1410 TSS error: 6.40827355461 %correct: 59.523809523809526
Epoch: 1415 TSS error: 7.68036171917 %correct: 45.23809523809524
Epoch: 1420 TSS error: 5.99534632062 %correct: 61.904761904761905
Epoch: 1425 TSS error: 6.8417815802 %correct: 58.333333333333336
Epoch: 1430 TSS error: 6.45706189689 %correct: 60.71428571428571
Epoch: 1435 TSS error: 6.16781749054 %correct: 61.904761904761905
Epoch: 1440 TSS error: 6.05424693645 %correct: 63.095238095238095
Epoch: 1445 TSS error: 6.097169

Epoch: 2020 TSS error: 5.9096438707 %correct: 61.904761904761905
Epoch: 2025 TSS error: 6.09631849336 %correct: 63.095238095238095
Epoch: 2030 TSS error: 5.91173488192 %correct: 61.904761904761905
Epoch: 2035 TSS error: 5.79093016952 %correct: 65.47619047619048
Epoch: 2040 TSS error: 5.59522915712 %correct: 69.04761904761905
Epoch: 2045 TSS error: 5.85719574663 %correct: 61.904761904761905
Epoch: 2050 TSS error: 5.46599539219 %correct: 66.66666666666666
Epoch: 2055 TSS error: 5.82219070204 %correct: 64.28571428571429
Epoch: 2060 TSS error: 5.60954514153 %correct: 65.47619047619048
Epoch: 2065 TSS error: 5.810848376 %correct: 61.904761904761905
Epoch: 2070 TSS error: 6.85359611601 %correct: 54.761904761904766
Epoch: 2075 TSS error: 5.46026034797 %correct: 67.85714285714286
Epoch: 2080 TSS error: 5.40980629262 %correct: 70.23809523809523
Epoch: 2085 TSS error: 6.09967290999 %correct: 63.095238095238095
Epoch: 2090 TSS error: 5.65830909471 %correct: 71.42857142857143
Epoch: 2095 TSS error

Epoch: 2650 TSS error: 5.21646720455 %correct: 71.42857142857143
Epoch: 2655 TSS error: 5.34753837985 %correct: 70.23809523809523
Epoch: 2660 TSS error: 4.94287263058 %correct: 72.61904761904762
Epoch: 2665 TSS error: 6.16497874079 %correct: 59.523809523809526
Epoch: 2670 TSS error: 4.98070810933 %correct: 72.61904761904762
Epoch: 2675 TSS error: 5.1217783738 %correct: 70.23809523809523
Epoch: 2680 TSS error: 4.97450679612 %correct: 72.61904761904762
Epoch: 2685 TSS error: 5.88494952244 %correct: 63.095238095238095
Epoch: 2690 TSS error: 5.25371771775 %correct: 75.0
Epoch: 2695 TSS error: 5.2888222823 %correct: 70.23809523809523
Epoch: 2700 TSS error: 4.98791077785 %correct: 72.61904761904762
Epoch: 2705 TSS error: 5.04411515219 %correct: 69.04761904761905
Epoch: 2710 TSS error: 5.21828584983 %correct: 73.80952380952381
Epoch: 2715 TSS error: 6.30227120766 %correct: 58.333333333333336
Epoch: 2720 TSS error: 5.06479089639 %correct: 69.04761904761905
Epoch: 2725 TSS error: 5.03251415297 

Epoch: 3540 TSS error: 4.95737137357 %correct: 67.85714285714286
Epoch: 3545 TSS error: 4.76843434566 %correct: 75.0
Epoch: 3550 TSS error: 4.93397314179 %correct: 73.80952380952381
Epoch: 3555 TSS error: 4.81439021825 %correct: 75.0
Epoch: 3560 TSS error: 4.96863807248 %correct: 70.23809523809523
Epoch: 3565 TSS error: 4.71910620826 %correct: 75.0
Epoch: 3570 TSS error: 4.95844974674 %correct: 66.66666666666666
Epoch: 3575 TSS error: 5.57029390155 %correct: 60.71428571428571
Epoch: 3580 TSS error: 4.73476449715 %correct: 72.61904761904762
Epoch: 3585 TSS error: 4.89014177409 %correct: 70.23809523809523
Epoch: 3590 TSS error: 5.0043055811 %correct: 70.23809523809523
Epoch: 3595 TSS error: 5.0474026913 %correct: 72.61904761904762
Epoch: 3600 TSS error: 4.58487616049 %correct: 76.19047619047619
Epoch: 3605 TSS error: 4.661190923 %correct: 76.19047619047619
Epoch: 3610 TSS error: 4.87228680274 %correct: 73.80952380952381
Epoch: 3615 TSS error: 4.65345517923 %correct: 77.38095238095238
Epo

Epoch: 4375 TSS error: 4.21274209437 %correct: 76.19047619047619
Epoch: 4380 TSS error: 4.25929826908 %correct: 77.38095238095238
Epoch: 4385 TSS error: 4.61352781146 %correct: 75.0
Epoch: 4390 TSS error: 4.35885085055 %correct: 72.61904761904762
Epoch: 4395 TSS error: 4.65688086554 %correct: 72.61904761904762
Epoch: 4400 TSS error: 4.26681793802 %correct: 77.38095238095238
Epoch: 4405 TSS error: 4.75559939803 %correct: 73.80952380952381
Epoch: 4410 TSS error: 4.21538273269 %correct: 77.38095238095238
Epoch: 4415 TSS error: 4.46131687327 %correct: 76.19047619047619
Epoch: 4420 TSS error: 4.20329642351 %correct: 76.19047619047619
Epoch: 4425 TSS error: 4.33226839035 %correct: 76.19047619047619
Epoch: 4430 TSS error: 4.47571140087 %correct: 76.19047619047619
Epoch: 4435 TSS error: 4.4923802325 %correct: 72.61904761904762
Epoch: 4440 TSS error: 4.87176213122 %correct: 73.80952380952381
Epoch: 4445 TSS error: 4.7511275374 %correct: 72.61904761904762
Epoch: 4450 TSS error: 4.63007028272 %co

This is a harder problem than the single-step network. 

Does it work well enough to move the robot around?

Note: datasets may be shuffled, so let's rebuild:

In [20]:
stepwise_dataset = build_stepwise_dataset(*goalset)
sequence_dataset = build_sequence_dataset(*goalset)

In [26]:
# Test learning:
myseq = 0
for seq in goalset:
    canvas = Canvas((200, 200))
    gd.robot.useTrail = True
    gd.robot.display["trail"] = 1
    gd.robot.display["body"] = 0
    gd.robot.trail[:] = []
    hidden_goal = sequence_dataset[myseq * 12][0][hidden_size:]
    # put robot at initial pose:
    pose = log["poses"][log["goals"][seq] - gd.recall_steps]
    gd.robot.setPose(*pose)
    gd.robot.stall = log["stalls"][log["goals"][seq] - gd.recall_steps]
    # get sensors:
    sensor_t0 = gd.read_sensors()[0]
    # get hidden_t0:
    motor_t0 = np.array([0, 0])
    hidden = stepwise.layer[0].propagate(np.concatenate([sensor_t0, (motor_t0 + 1.0)/2.0]))
    h1 = sequence.propagate(np.concatenate([hidden, hidden_goal]))[:hidden_size]
    if list(hidden_goal) != list(sequence_dataset[myseq * 12][0][hidden_size:]):
        print("hidden_goal is wrong!")
        break        
    if list(hidden) != list(sequence_dataset[myseq * 12][0][:hidden_size]):
        print("initial hidden wrong!")
        break
    for i in range(len(sequence_dataset[myseq * 12:myseq * 12 + 12])):
        motor_output = stepwise.layer[1].propagate(h1)[-2:]
        if i == 0:
            # don't really move, that should be no-op
            motor_output = np.array([0.5, 0.5])
        print(seq, i, stepwise_dataset[myseq * 13 + i][0][-2:], motor_output)
        motor_output = motor_output * 2.0 - 1.0
        gd.robot.move(*motor_output)
        gd.sim.step()
        sensor_t0 = gd.read_sensors()[0]
        hidden = stepwise.layer[0].propagate(np.concatenate([sensor_t0, (motor_output + 1.0)/2.0]))
        h1 = sequence.propagate(np.concatenate([hidden, hidden_goal]))[:hidden_size]
    myseq += 1
    step = log["goals"][seq]
    gd.sim.draw(canvas)
    training_canvas = gd.render_step_behavior(step - 10, step + 2)
    print("Goal", seq)
    print("Training (left), Learned (right)")
    html = HTML("<span>%s %s</span>" % (training_canvas.render(), canvas.render()))
    display(html)
        

10 0 [ 0.5  0.5] [ 0.5  0.5]
10 1 [ 0.77100438  0.68047352] [ 0.39333636  0.40273501]
10 2 [ 0.77100438  0.68047352] [ 0.46757761  0.45222217]
10 3 [ 0.38208515  0.5904439 ] [ 0.44673756  0.44244263]
10 4 [ 0.38208515  0.5904439 ] [ 0.45596821  0.45659187]
10 5 [ 0.38208515  0.5904439 ] [ 0.45104118  0.46808372]
10 6 [ 0.38208515  0.5904439 ] [ 0.44151444  0.4783772 ]
10 7 [ 0.38208515  0.5904439 ] [ 0.43041205  0.48969324]
10 8 [ 0.38208515  0.5904439 ] [ 0.41727149  0.50322638]
10 9 [ 0.37585798  0.94142052] [ 0.40172195  0.52057243]
10 10 [ 0.37585798  0.94142052] [ 0.38431697  0.54517382]
10 11 [ 0.37585798  0.94142052] [ 0.36690623  0.58379006]
Rendering simulator images...
Goal 10
Training (left), Learned (right)


29 0 [ 0.5  0.5] [ 0.5  0.5]
29 1 [ 0.5520867   0.16769625] [ 0.45435133  0.09275374]
29 2 [ 0.5520867   0.16769625] [ 0.51409493  0.20855017]
29 3 [ 0.5520867   0.16769625] [ 0.57487312  0.31269688]
29 4 [ 0.07246636  0.52682716] [ 0.63941207  0.62056541]
29 5 [ 0.5520867   0.16769625] [ 0.67899269  0.45748875]
29 6 [ 0.5520867   0.16769625] [ 0.62004138  0.5445315 ]
29 7 [ 0.5520867   0.16769625] [ 0.69103839  0.52384158]
29 8 [ 0.5520867   0.16769625] [ 0.62722454  0.50729583]
29 9 [ 0.5520867   0.16769625] [ 0.67828294  0.54774527]
29 10 [ 0.77038251  0.93194393] [ 0.64315743  0.50005556]
29 11 [ 0.5520867   0.16769625] [ 0.6636294   0.54580704]
Rendering simulator images...
Goal 29
Training (left), Learned (right)


33 0 [ 0.5  0.5] [ 0.5  0.5]
33 1 [ 0.11409606  0.31777896] [ 0.09995381  0.27184658]
33 2 [ 0.11409606  0.31777896] [ 0.19130536  0.48945677]
33 3 [ 0.11409606  0.31777896] [ 0.3169312   0.78134909]
33 4 [ 0.44436408  0.89046775] [ 0.50260066  0.90876242]
33 5 [ 0.44436408  0.89046775] [ 0.50892037  0.90023116]
33 6 [ 0.44436408  0.89046775] [ 0.42047922  0.83479939]
33 7 [ 0.44436408  0.89046775] [ 0.19699095  0.59928818]
33 8 [ 0.44436408  0.89046775] [ 0.12459615  0.59456604]
33 9 [ 0.44436408  0.89046775] [ 0.16039766  0.58069629]
33 10 [ 0.03022118  0.03876438] [ 0.18978931  0.48285501]
33 11 [ 0.44436408  0.89046775] [ 0.24375137  0.6317246 ]
Rendering simulator images...
Goal 33
Training (left), Learned (right)


39 0 [ 0.5  0.5] [ 0.5  0.5]
39 1 [ 0.62316554  0.78899734] [ 0.61230623  0.79823339]
39 2 [ 0.62316554  0.78899734] [ 0.60315968  0.78635067]
39 3 [ 0.62316554  0.78899734] [ 0.62295915  0.82599304]
39 4 [ 0.62316554  0.78899734] [ 0.693436    0.88717461]
39 5 [ 0.62316554  0.78899734] [ 0.78448312  0.94049974]
39 6 [ 0.84265506  0.99937114] [ 0.84770109  0.96769018]
39 7 [ 0.84265506  0.99937114] [ 0.86849825  0.97680145]
39 8 [ 0.84265506  0.99937114] [ 0.90187779  0.9760923 ]
39 9 [ 0.84265506  0.99937114] [ 0.8956465   0.96900454]
39 10 [ 0.84265506  0.99937114] [ 0.79443787  0.90267541]
39 11 [ 0.84265506  0.99937114] [ 0.7342715  0.5849787]
Rendering simulator images...
Goal 39
Training (left), Learned (right)


40 0 [ 0.5  0.5] [ 0.5  0.5]
40 1 [ 0.65073144  0.38975265] [ 0.62305375  0.44137096]
40 2 [ 0.65073144  0.38975265] [ 0.60812927  0.4798707 ]
40 3 [ 0.65073144  0.38975265] [ 0.58151032  0.52298586]
40 4 [ 0.65073144  0.38975265] [ 0.56761776  0.53830204]
40 5 [ 0.33272589  0.90482181] [ 0.5647565   0.53886733]
40 6 [ 0.33272589  0.90482181] [ 0.56521938  0.53391882]
40 7 [ 0.33272589  0.90482181] [ 0.56576713  0.52985618]
40 8 [ 0.33272589  0.90482181] [ 0.56493504  0.52892569]
40 9 [ 0.33272589  0.90482181] [ 0.56310031  0.52961834]
40 10 [ 0.33272589  0.90482181] [ 0.5612353   0.52996607]
40 11 [ 0.17789765  0.86387146] [ 0.55976088  0.52945103]
Rendering simulator images...
Goal 40
Training (left), Learned (right)


59 0 [ 0.5  0.5] [ 0.5  0.5]
59 1 [ 0.09984668  0.17584192] [ 0.10607549  0.16119786]
59 2 [ 0.09984668  0.17584192] [ 0.1131802   0.36717298]
59 3 [ 0.09667999  0.52738884] [ 0.10398088  0.46397506]
59 4 [ 0.09667999  0.52738884] [ 0.09920747  0.53879271]
59 5 [ 0.09667999  0.52738884] [ 0.08617281  0.53160428]
59 6 [ 0.09667999  0.52738884] [ 0.17454375  0.44172291]
59 7 [ 0.09667999  0.52738884] [ 0.13850684  0.49197482]
59 8 [ 0.09667999  0.52738884] [ 0.15825477  0.45669258]
59 9 [ 0.54958553  0.33401549] [ 0.14755484  0.48518995]
59 10 [ 0.09667999  0.52738884] [ 0.15193792  0.45753723]
59 11 [ 0.54958553  0.33401549] [ 0.15206618  0.48918803]
Rendering simulator images...
Goal 59
Training (left), Learned (right)


68 0 [ 0.5  0.5] [ 0.5  0.5]
68 1 [ 0.97905087  0.23206347] [ 0.94239111  0.24044641]
68 2 [ 0.97905087  0.23206347] [ 0.942881    0.24661398]
68 3 [ 0.97905087  0.23206347] [ 0.94981036  0.23773567]
68 4 [ 0.97905087  0.23206347] [ 0.95621023  0.22606542]
68 5 [ 0.97905087  0.23206347] [ 0.96689223  0.25878462]
68 6 [ 0.97905087  0.23206347] [ 0.94756427  0.62735246]
68 7 [ 0.98767049  0.73186211] [ 0.96894001  0.41272582]
68 8 [ 0.98767049  0.73186211] [ 0.92388464  0.86719675]
68 9 [ 0.98767049  0.73186211] [ 0.97706531  0.29765194]
68 10 [ 0.98767049  0.73186211] [ 0.93262447  0.92283144]
68 11 [ 0.98767049  0.73186211] [ 0.95648971  0.46395963]
Rendering simulator images...
Goal 68
Training (left), Learned (right)


It learns as well as in the last experiment.
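
One rough way to put a number on the printed comparison is the mean absolute difference between the training motor values (left column) and the learned motor values (right column) for a goal. The sketch below does this for goals 39 and 40, with values copied from the output above and rounded to three decimals. The helper `mean_abs_error` is hypothetical. Because the robot runs closed-loop, its states can drift from the training trajectory, so this is only an indication of how closely the motor outputs track, not a measure of whether the goal was reached.

```python
import numpy as np

def mean_abs_error(trained, learned):
    """Mean absolute difference between two (steps, 2) arrays of motor values."""
    return float(np.mean(np.abs(np.asarray(trained) - np.asarray(learned))))

# Goal 39: training motors, then learned motors (rounded from the printed output)
trained_39 = np.array([[0.5, 0.5]] + [[0.623, 0.789]] * 5 + [[0.843, 0.999]] * 6)
learned_39 = np.array([
    [0.500, 0.500], [0.612, 0.798], [0.603, 0.786], [0.623, 0.826],
    [0.693, 0.887], [0.784, 0.940], [0.848, 0.968], [0.868, 0.977],
    [0.902, 0.976], [0.896, 0.969], [0.794, 0.903], [0.734, 0.585],
])

# Goal 40: training motors, then learned motors (rounded from the printed output)
trained_40 = np.array([[0.5, 0.5]] + [[0.651, 0.390]] * 4
                      + [[0.333, 0.905]] * 6 + [[0.178, 0.864]])
learned_40 = np.array([
    [0.500, 0.500], [0.623, 0.441], [0.608, 0.480], [0.582, 0.523],
    [0.568, 0.538], [0.565, 0.539], [0.565, 0.534], [0.566, 0.530],
    [0.565, 0.529], [0.563, 0.530], [0.561, 0.530], [0.560, 0.529],
])

mae_39 = mean_abs_error(trained_39, learned_39)
mae_40 = mean_abs_error(trained_40, learned_40)
print("goal 39:", round(mae_39, 3), "goal 40:", round(mae_40, 3))
```

On these numbers, goal 39 tracks the training motors much more closely than goal 40, where the learned output settles near a constant value after the first few steps.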